Efficient SPARQL Query Processing via Map-Reduce-Merge
Abstract
The move towards a “semantic web” is driving the need for efficient querying over large datasets consisting of statements about web resources. RDF is a set of standards for describing and modeling data and is the backbone of semantic web technologies. RDF datasets can be very large, and are often subject to complex queries intended to extract and infer otherwise unseen connections within the data. MapReduce is a framework that simplifies the development of programs for processing large data sets in a distributed, parallel, fault-tolerant fashion. MapReduce provides many of the features required to support the type of querying needed in the semantic web, but historically it has lacked a natural way to process joins, a critical component of RDF query processing. This paper presents a set of algorithms to support efficient processing of the core of SPARQL, an RDF query language, over an extension of MapReduce. A simple implementation of these algorithms is presented, and preliminary results are documented.
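To make concrete why joins matter for RDF querying, the following is a minimal sketch of a generic reduce-side join over two SPARQL triple patterns, `?x knows ?y` and `?y worksAt ?z`, simulated in memory. It is not the paper's Map-Reduce-Merge algorithm, which the abstract does not detail; the dataset, the predicate names, and the functions `map_phase`, `shuffle`, `reduce_phase`, and `run_query` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Toy RDF dataset of (subject, predicate, object) triples.
# All resource and predicate names are invented for this sketch.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("carol", "knows", "dave"),
    ("dave", "worksAt", "initech"),
    ("erin", "knows", "frank"),  # frank has no worksAt triple, so no match
]


def map_phase(triples):
    """Emit (join_key, tagged_value) pairs keyed on the shared variable ?y."""
    for s, p, o in triples:
        if p == "knows":
            # Pattern ?x knows ?y: ?y is the object, carry ?x along.
            yield o, ("L", s)
        elif p == "worksAt":
            # Pattern ?y worksAt ?z: ?y is the subject, carry ?z along.
            yield s, ("R", o)


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each ?y, combine every left binding with every right binding."""
    for y, values in groups.items():
        left = [v for tag, v in values if tag == "L"]
        right = [v for tag, v in values if tag == "R"]
        for x in left:
            for z in right:
                yield {"x": x, "y": y, "z": z}


def run_query(triples):
    """Run the two-pattern join and return bindings sorted by ?x."""
    results = reduce_phase(shuffle(map_phase(triples)))
    return sorted(results, key=lambda binding: binding["x"])


if __name__ == "__main__":
    for binding in run_query(TRIPLES):
        print(binding)
```

In this sketch each additional triple pattern that shares a variable with the result would need another map, shuffle, and reduce round, which illustrates why join handling dominates the cost of multi-pattern SPARQL queries on plain MapReduce.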
Similar resources
Query Performance Appraisal using SPARQL & Map Reduce Technique on Web Semantics
The Semantic Web is an emerging technology that aims to make data across the globe semantically connected. The data is represented in a very simple statement-like construct having a subject, a predicate, and an object. This can be visualized as a graph with the subject and the object as nodes and the predicate as an edge connecting the two nodes. When many statements like these are collected to...
Efficient SPARQL Query Evaluation via Automatic Data Partitioning
The volume of RDF data has increased very quickly over the last five years; for example, the Linked Open Data cloud has grown from 2 billion to 50 billion RDF triples. With its excellent scalability, a cloud computing platform like Hadoop is a good choice for processing queries over large data sets. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through c...
Cascading map-side joins over HBase for scalable join processing
One of the major challenges in large-scale data processing with MapReduce is the smart computation of joins. Since Semantic Web datasets published in RDF have increased rapidly over the last few years, scalable join techniques become an important issue for SPARQL query processing as well. In this paper, we introduce the Map-Side Index Nested Loop Join (MAPSIN join) which combines scalable index...
Federated SPARQL Query Processing Via CostFed
Efficient source selection and optimized query plan generation are among the most important optimization steps in federated query processing. This paper presents a demo of CostFed, an index-assisted federation engine for federated SPARQL query processing. CostFed’s source selection and query planning are based on the index generated from the SPARQL endpoints. The key innovation behind CostFed is...
RP-Filter: A Path-Based Triple Filtering Method for Efficient SPARQL Query Processing
With the rapid increase of RDF data, SPARQL query processing has received much attention. Currently, most RDF databases store RDF data in a relational table called a triple table and carry out several join operations on the triple table for SPARQL query processing. However, execution plans with many joins might be inefficient due to a large amount of intermediate data being passed betwee...